Adaptively Scheduling Parallel Loops in Distributed Shared-Memory Systems
Authors
Abstract
Using runtime information about load distributions and processor affinity, we propose an adaptive scheduling algorithm and several variations of it that use different control mechanisms. The proposed algorithm applies different degrees of aggressiveness when adjusting loop scheduling granularity, with the aim of improving the execution performance of parallel loops by making scheduling decisions that match the real workload distribution at runtime. We experimentally compared the performance of our algorithm and its variations with several existing scheduling algorithms on two parallel machines: the KSR-1 and the Convex Exemplar. The kernel application programs used for the performance evaluation were carefully selected to represent different classes of parallel loops. Our results show that using runtime information to adaptively adjust scheduling granularity is an effective way to handle loops with a wide range of load distributions when no prior knowledge of the execution is available. The overhead of collecting runtime information is insignificant in comparison with the performance improvement. Our experiments show that the adaptive algorithm and its five variations outperformed the existing scheduling algorithms.
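The idea of adjusting scheduling granularity from runtime load information can be illustrated with a small sketch. This is not the algorithm from the paper, whose details the abstract does not give. It is a minimal example under stated assumptions: each processor owns a local queue of iterations, a processor's load is judged by comparing its completed iteration count with the average across processors, and a hypothetical `aggressiveness` factor scales the chunk size up or down. The function name, the comparison rule, and the default factor are all illustrative choices.

```python
from math import ceil

def next_chunk(remaining_local, done_local, done_all, aggressiveness=2.0):
    """Pick how many iterations a processor takes next from its local queue.

    Hypothetical rule (an assumption, not the paper's method):
    - baseline: take 1/P of the local remainder, where P = len(done_all);
    - a processor that has completed fewer iterations than average is
      treated as heavily loaded and takes smaller chunks;
    - a processor that is ahead of the average takes larger chunks.
    """
    if remaining_local <= 0:
        return 0
    p = len(done_all)
    average_done = sum(done_all) / p
    if done_local < average_done:
        divisor = p * aggressiveness            # lagging: finer granularity
    elif done_local > average_done:
        divisor = max(1.0, p / aggressiveness)  # ahead: coarser granularity
    else:
        divisor = p                             # balanced: baseline chunk
    # Always take at least one iteration and never more than what is left.
    return max(1, min(remaining_local, ceil(remaining_local / divisor)))
```

With four processors and 100 local iterations remaining, a processor that is level with the average would take 25 iterations under this rule, while one that lags or leads would take a smaller or larger chunk respectively. A larger `aggressiveness` value widens that difference.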
Similar resources
Evaluation of Loop Scheduling Algorithms on Distributed Memory Systems
Loops are the largest source of parallelism in many applications. All prior DOALL loop scheduling algorithms such as Self-Scheduling, Guided Self-Scheduling, Trapezoid Self-Scheduling, and Factoring try to achieve workload balance through decreasing chunk sizes. Moreover, they have been analyzed only for shared memory platforms. In this work, the prior loop scheduling methods will be evaluated ...
Load Balancing for Parallel Loops in Workstation Clusters
Load imbalance is a serious impediment to achieving good performance in parallel processing. Global load balancing schemes cannot adequately manage to balance parallel tasks generated from a single application. Dynamic loop scheduling methods are known to be useful in balancing parallel loops on shared-memory multiprocessor machines. However, their centralized nature causes a bottleneck even fo...
Simple Code Generation for special UDLs
This paper focuses on transforming sequential perfectly nested loops into their equivalent parallel form. A special category of nested FOR loops is the uniform dependence loops (UDLs), which yield efficient parallelization techniques. An automatic code generation tool for shared and distributed memory machines has been developed in order to automatically parallelize these perfectly nested loop...
Scheduling User-Level Threads on Distributed Shared-Memory Multiprocessors
In this paper we present Dynamic Bisectioning or DBS, a simple but powerful comprehensive scheduling policy for user-level threads, which unifies the exploitation of (multidimensional) loop and nested functional (or task) parallelism. Unlike other schemes that have been proposed and used thus far, DBS is not constrained to scheduling DAGs or singly nested parallel loops. Rather, our policy enco...
The Impact of Parallel Loop Scheduling Strategies on Prefetching in a Shared Memory Multiprocessor
Trace-driven simulations of numerical Fortran programs are used to study the impact of the parallel loop scheduling strategy on data prefetching in a shared memory multiprocessor with private data caches. The simulations indicate that to maximize memory performance it is important to schedule blocks of consecutive iterations to execute on each processor, and then to adaptively prefetch singlewo...
Journal title:
- IEEE Trans. Parallel Distrib. Syst.
Volume 8, issue
Pages -
Publication date 1997